Introduction to Python and Google Colab

Rob Willans

Welcome and Agenda

  • Introductions
  • Today’s Session
    • Morning
      • Intro to Python and Google Colab
      • Importing and manipulating data with Pandas
    • Afternoon
      • Visualisations with Matplotlib
      • Where to go next
  • We will finish by 4pm at the latest, including breaks

A very brief overview of Python

Why Python for Data Science?

  • Popularity and community support.
  • Extensive libraries and frameworks (e.g., pandas, numpy, matplotlib, scikit-learn).
  • Integration with other technologies and tools.

NHS Python Community

https://nhs-pycom.net

Understanding IDEs

  • Integrated Development Environments (IDEs) make writing and working with code much easier
  • Notebook IDEs organise your work as a ‘notebook’: a single document made up of cells
    • Key features: Code cells, markdown cells, inline outputs

Introduction to Google Colab

  • Notebook IDE that works with Google Workspace
  • Primarily designed for Python
  • No need to manage environments, package installs or hardware
  • Built-in access to Gemini, Google’s AI assistant

Getting started in Colab

  • Python as calculator!
  • Try writing some markdown text!
  • Write “hello world”
  • Assign a value to a variable
# Hello world
print("Hello world!")  # output: Hello world!

# Assignment
an_object = "some text"
print(an_object)  # output: some text
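For the "Python as calculator!" exercise, a cell might look like the sketch below. The variable names are illustrative only; the operators are standard Python.

```python
# Python as a calculator: each line evaluates an arithmetic expression
addition = 2 + 3          # 5
multiplication = 4 * 2.5  # 10.0
division = 7 / 2          # 3.5 (true division always returns a float)
floor_division = 7 // 2   # 3 (divides, then rounds down to a whole number)
remainder = 7 % 2         # 1 (the remainder after division)
power = 2 ** 3            # 8 (2 to the power of 3)

print(addition, multiplication, division, floor_division, remainder, power)
```

In Colab you do not always need print(): the value of the last line in a cell is displayed automatically.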

Basic Data Types in Python

  • Strings: Text data.
patient_name = "Alice"
  • Numeric: Integer and float types.
patient_age = 30  # Integer: a whole number
patient_weight = 70.5  # Float: a number with a decimal part
  • Boolean: True or False values.
is_discharged = True
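The built-in type() function reports which of these data types a value has. A minimal sketch reusing the example values above:

```python
patient_name = "Alice"
patient_age = 30
patient_weight = 70.5
is_discharged = True

# type() reports the data type of any value
print(type(patient_name))    # <class 'str'>
print(type(patient_age))     # <class 'int'>
print(type(patient_weight))  # <class 'float'>
print(type(is_discharged))   # <class 'bool'>

# Arithmetic mixing an int and a float gives a float
print(type(patient_age + patient_weight))  # <class 'float'>
```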

Python Data Structures

  • Lists: Mutable collections of items.
patient_ages = [25, 30, 45, 50, 60]
  • Tuples: Immutable collections of items.
patient_info = ('Joe Bloggs', 45, 'A+') 
  • Dictionaries: Mutable collections of key-value pairs.
patient_record = {'name': 'Methuselah', 'age': 369, 'diagnosis': 'Frailty'} 
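The difference between mutable and immutable is easiest to see by trying to change each structure. This sketch reuses the example values above; the 'ward' key and its value are hypothetical, added only to show that a dictionary can gain new keys.

```python
patient_ages = [25, 30, 45, 50, 60]
patient_info = ('Joe Bloggs', 45, 'A+')
patient_record = {'name': 'Methuselah', 'age': 369, 'diagnosis': 'Frailty'}

# Lists are mutable: items can be added or changed in place
patient_ages.append(72)
patient_ages[0] = 26
print(patient_ages)  # [26, 30, 45, 50, 60, 72]

# Dictionaries are mutable: values are read and updated by key
print(patient_record['diagnosis'])  # Frailty
patient_record['age'] = 370
patient_record['ward'] = 'B2'  # hypothetical new key, for illustration only

# Tuples are immutable: changing an item raises a TypeError
try:
    patient_info[1] = 46
except TypeError as error:
    print("Cannot change a tuple:", error)
```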

Indexing in Python

  • Python starts indexing from 0
  • So the first item in a list is at index 0
  • By contrast, R starts indexing from 1
# Python
patient_ages = [25, 30, 45, 50, 60]
patient_ages[0]  # returns 25

# R
patient_ages <- c(25, 30, 45, 50, 60)
patient_ages[1]  # returns 25 (printed as [1] 25)
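A slightly fuller sketch of zero-based indexing, reusing the example structures from earlier. Tuples are indexed by position just like lists, negative indices count back from the end, and dictionaries are looked up by key instead.

```python
patient_ages = [25, 30, 45, 50, 60]
patient_info = ('Joe Bloggs', 45, 'A+')
patient_record = {'name': 'Methuselah', 'age': 369, 'diagnosis': 'Frailty'}

first_age = patient_ages[0]    # 25: index 0 is the first item
second_age = patient_ages[1]   # 30: index 1 is the *second* item
last_age = patient_ages[-1]    # 60: -1 is the last item
blood_group = patient_info[2]  # 'A+': tuples are indexed the same way

# Dictionaries are looked up by key rather than by position
record_age = patient_record['age']  # 369

print(first_age, second_age, last_age, blood_group, record_age)

# Five items means valid indices are 0 to 4, so index 5 is out of range
try:
    patient_ages[5]
except IndexError as error:
    print("IndexError:", error)
```

The IndexError is a common surprise for R users, who expect the last of five items to be at position 5.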

LLMs for code

  • Relatively new, but very powerful
  • Don’t ask it to do the whole job
    • It will not know the structure of your data, and you will not learn anything
  • If you do use it, ask it to write small pieces of code that you can test, and to explain them
  • Give it context in your prompt (e.g. what your data looks like and what you are trying to do)
  • Ask it to explain code to you
  • Ask it what the problem is when things go wrong

Code libraries

  • Code libraries are prepackaged collections of objects, methods and functions for specific tasks
  • We will look at Pandas for manipulating data and Matplotlib for data visualisation
  • When working in your own environment, you need to install these with pip or conda
  • Happily, Colab means we don’t have to set these up
  • Practice importing pandas (as pd) and numpy (as np)
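A minimal sketch of the import exercise, assuming a standard Colab runtime where pandas and numpy are already installed. The pd and np aliases are the usual community convention; the small arrays and table below reuse example values from earlier and are illustrative only.

```python
# Import the libraries under their conventional short aliases
import numpy as np
import pandas as pd

# The alias is then used to reach each library's functions
patient_ages = np.array([25, 30, 45, 50, 60])
print(patient_ages.mean())  # 42.0

patients = pd.DataFrame({
    'name': ['Alice', 'Joe Bloggs'],
    'age': [30, 45],
})
print(patients)
```

If an import fails in your own environment (outside Colab), the library usually needs installing first with pip or conda.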

Any Questions?

Any questions, thoughts, errors you got, or things you want demonstrated again?